Source file ⇒ The_Analytics_Edge_edX_MIT15.071x_June2015_4.rmd
These are my notes for the lectures of The_Analytics_Edge_edX_MIT15.071x_June2015 by Professor Dimitris Bertsimas. The goal of these notes is to provide reproducible R code for all the lectures.
A good list of resources about using R for Text Analytics is given below:
NOTE: I have included some summary outputs that I could have omitted. The main intention was to check that each function performed as desired, since there were some issues caused by differences between the tm package version used in the lecture and the latest version.
We will be trying to understand sentiment of tweets about the company Apple.
While Apple has a large number of fans, they also have a large number of people who don’t like their products. They also have several competitors.
To better understand public perception, Apple wants to monitor how people feel over time and how people receive new announcements.
Our challenge in this lecture is to see if we can correctly classify tweets as being negative, positive, or neither about Apple.
The Data
To collect the data needed for this task, we had to perform two steps.
The first was to collect data about tweets from the internet.
Twitter data is publicly available, and it can be collected by scraping the website or via the Twitter API.
The sender of the tweet might be useful to predict sentiment, but we will ignore it to keep our data anonymized.
So we will just be using the text of the tweet.
Then we need to construct the outcome variable for these tweets, which means that we have to label them as positive, negative, or neutral sentiment.
We would like to label thousands of tweets, and we know that two people might disagree over the correct classification of a tweet. To do this efficiently, one option is to use Amazon Mechanical Turk.
The task that we put on Amazon Mechanical Turk was to judge the sentiment expressed by the following item toward the software company Apple.
The items we gave them were tweets that we had collected. The workers could pick from the following options as their response: strongly negative, negative, neutral, positive, or strongly positive.
These outcomes were represented as a number on a scale from -2 to 2.
Each tweet was labeled by five workers. For each tweet, we take the average of the five scores given by the five workers, hence the final scores can range from -2 to 2 in increments of 0.2.
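As a small illustration of the averaging step, the sketch below uses five made-up worker scores (the vector `scores` is hypothetical, not taken from the data set) and shows why the averaged labels fall on a grid of 0.2.

```r
# Hypothetical scores from five workers for one tweet, each an integer in -2..2
scores <- c(-2, -1, -1, 0, -1)

# The tweet's label is the mean of the five scores
avg <- mean(scores)
avg

# A sum of five integers divided by 5 is always a multiple of 0.2,
# so the possible averages are -2, -1.8, ..., 1.8, 2
possible <- seq(-2, 2, by = 0.2)
length(possible)
```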
The following graph shows the distribution of the number of tweets classified into each of the categories. We can see here that the majority of tweets were classified as neutral, with a small number classified as strongly negative or strongly positive.
distribution of score
So now we have a bunch of tweets that are labeled with their sentiment. But how do we build independent variables from the text of a tweet to be used to predict the sentiment?
A Bag of Words
One of the most widely used techniques for transforming text into independent variables is called Bag of Words.
Fully understanding text is difficult, but Bag of Words provides a very simple approach: it just counts the number of times each word appears in the text and uses these counts as the independent variables.
For example, in the sentence,
"This course is great. I would recommend this course to my friends,"
the word this is seen twice, the word course is seen twice, the word great is seen once, et cetera.
bag of words
In Bag of Words, there is one feature for each word. This is a very simple approach, but is often very effective, too. It is used as a baseline in text analytics projects and for Natural Language Processing.
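The counting described above can be sketched in base R. This is only an illustration on the example sentence, not the tm workflow used later in these notes. Lower-casing is applied first so that "This" and "this" count as the same word.

```r
# The example sentence from the notes
sentence <- "This course is great. I would recommend this course to my friends"

# Lower-case, drop punctuation, and split on whitespace
words <- strsplit(gsub("[[:punct:]]", "", tolower(sentence)), "\\s+")[[1]]

# One count per distinct word: these counts are the Bag of Words features
bag <- table(words)
bag[c("this", "course", "great")]
```

Each distinct word becomes one column of the feature matrix, and each document contributes one row of counts.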
This is not the whole story, though. Preprocessing the text can dramatically improve the performance of the Bag of Words method.
Cleaning Up Irregularities
One part of preprocessing the text is to clean up irregularities.
Text data often has many inconsistencies that will cause algorithms trouble. Computers are very literal by default.
One common irregularity concerns the case of the letters, and it is customary to change all words to either lower-case or upper-case.
Punctuation also causes problems, and the basic approach is to remove everything that is not a letter. However, some punctuation is meaningful, and therefore the removal of punctuation should be tailored to the specific problem.
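A minimal base R sketch of these two clean-up steps follows. The input string is hypothetical, and the choice to keep "@" and "#" in the tailored variant is only an assumed example of punctuation that might be meaningful for tweets.

```r
# Hypothetical tweet-like string (not from the data set)
text <- "LOVE U @APPLE!!! #ThanxApple"

# Make the case consistent
lower <- tolower(text)

# Basic approach: remove every punctuation character
no_punct <- gsub("[[:punct:]]", "", lower)
no_punct

# Tailored alternative: drop punctuation but keep "@" and "#"
keep_tags <- gsub("[^[:alnum:][:space:]@#]", "", lower)
keep_tags
```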
There are also unhelpful terms:
Stemming: This step is motivated by the desire to represent words with different endings as the same word. We probably do not need to draw a distinction between argue, argued, argues, and arguing. They could all be represented by a common stem, argu. The algorithmic process of performing this reduction is called stemming.
There are many ways to approach the problem.
One widely popular approach is the rule-based Porter Stemmer, designed by Martin Porter in 1980, which is still used today.
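To make the idea of stemming concrete, here is a toy suffix-stripping function. It is a deliberately crude stand-in and is not the Porter algorithm. The function name `toy_stem` and its single rule are assumptions made for illustration; the code later in these notes relies on `stemDocument` from the tm package instead.

```r
# Toy suffix-stripping rule, for illustration only (NOT the Porter algorithm)
toy_stem <- function(words) {
  # Remove one common ending from the end of each word
  sub("(ing|ed|es|e|s)$", "", words)
}

stems <- toy_stem(c("argue", "argued", "argues", "arguing"))
stems
```

All four forms collapse to the same stem, which is the behaviour a real stemmer aims for with far more careful rules.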
QUICK QUESTION
Which of these problems is the LEAST likely to be a good application of natural language processing?
Ans:Judging the winner of a poetry contest
EXPLANATION:Judging the winner of a poetry contest requires a deep level of human understanding and emotion. Perhaps someday a computer will be able to accurately judge the winner of a poetry contest, but currently the other three tasks are much better suited for natural language processing.
QUICK QUESTION
For each tweet, we computed an overall score by averaging all five scores assigned by the Amazon Mechanical Turk workers. However, Amazon Mechanical Turk workers might make significant mistakes when labeling a tweet. The mean could be highly affected by this.
Which of the three alternative metrics below would best capture the typical opinion of the five Amazon Mechanical Turk workers, would be less affected by mistakes, and is well-defined regardless of the five labels?
Ans:An overall score equal to the median (middle) score
EXPLANATION:The correct answer is the first one: the median captures the typical opinion of the workers and tends to be less affected by significant mistakes. A majority score would not be defined for every tweet, because some tweets have no majority label (consider a tweet with scores 0, 0, 1, 1, and 2). The minimum score does not necessarily capture the typical opinion and could be highly affected by mistakes (consider a tweet with scores -2, 1, 1, 1, 1).
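The two score sets mentioned in the explanation can be checked directly in R. This is just a quick sketch of the comparison. The majority check assumes that "majority" means a label chosen by more than half of the workers.

```r
# One worker gives an outlying -2
scores <- c(-2, 1, 1, 1, 1)
mean(scores)    # pulled down by the single -2
median(scores)  # stays at the typical value
min(scores)     # driven entirely by the outlier

# No label is chosen by more than half of the five workers
no_majority <- c(0, 0, 1, 1, 2)
has_majority <- max(table(no_majority)) > length(no_majority) / 2
has_majority
median(no_majority)  # the median is still well-defined
```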
QUICK QUESTION
For each of the following questions, pick the preprocessing task that we discussed in the previous video that would change the sentence “Data is useful AND powerful!” to the new sentence listed in the question.
New sentence: Data useful powerful!
Ans:Removing stop words
New sentence: data is useful and powerful
Ans:Cleaning up irregularities (changing to lowercase and removing punctuation)
New sentence: Data is use AND power!
Ans:Stemming
EXPLANATION:The first new sentence has the stop words “is” and “and” removed. The second new sentence has the irregularities removed (no capital letters or punctuation). The third new sentence has the words stemmed - the “ful” is removed from “useful” and “powerful”.
Sys.setlocale("LC_ALL", "C")
## [1] "C"
# Unit 5 - Twitter
# VIDEO 5
#LOADING AND PROCESSING DATA IN R
tweets = read.csv("tweets.csv", stringsAsFactors=FALSE)
#Note: when working on a text analytics problem it is important (necessary!) to add the extra argument stringsAsFactors = FALSE, so that the text is read in properly.
#Let's take a look at the structure of our data:
str(tweets)
## 'data.frame': 1181 obs. of 2 variables:
## $ Tweet: chr "I have to say, Apple has by far the best customer care service I have ever received! @Apple @AppStore" "iOS 7 is so fricking smooth & beautiful!! #ThanxApple @Apple" "LOVE U @APPLE" "Thank you @apple, loving my new iPhone 5S!!!!! #apple #iphone5S pic.twitter.com/XmHJCU4pcb" ...
## $ Avg : num 2 2 1.8 1.8 1.8 1.8 1.8 1.6 1.6 1.6 ...
#We have 1181 observations of 2 variables:
##Tweet: the text of the tweet.
##Avg: the average sentiment score.
#The tweet texts are real tweets directed at Apple that were gathered from the internet, with a few words cleaned up. We are most interested in detecting tweets with clear negative sentiment, so let's define a new variable in our data set called Negative:
#equal to TRUE if the average sentiment score is less than or equal to -1
#equal to FALSE if the average sentiment score is greater than -1.
# Create dependent variable
tweets$Negative = as.factor(tweets$Avg <= -1)
table(tweets$Negative)
##
## FALSE TRUE
## 999 182
#Now we pre-process our text data so that we can use the 'Bag of Words' approach. We will be using 'tm', the text mining package.
#install.packages("tm")
library(tm)
#install.packages("SnowballC")
library(SnowballC)
#One of the concepts introduced by the tm package is that of a corpus. A corpus is a collection of documents. We need to convert our tweets into a corpus for pre-processing.
#Various functions in the tm package can be used to create a corpus in many different ways. We will create it from the Tweet column of our data frame using two functions, Corpus() and VectorSource(). We feed the latter the Tweet variable of the tweets data frame.
# Create corpus
corpus = Corpus(VectorSource(tweets$Tweet))
# Look at corpus
corpus
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 0
## Content: documents: 1181
#We can check that the documents match our tweets by using double brackets [[.
#To inspect the first (or 10th) tweet in our corpus, we select the first (or 10th) element as:
attributes(corpus[[1]])
## $names
## [1] "content" "meta"
##
## $class
## [1] "PlainTextDocument" "TextDocument"
corpus[[1]]$content
## [1] "I have to say, Apple has by far the best customer care service I have ever received! @Apple @AppStore"
corpus[[10]]$content
## [1] "Just checked out the specs on the new iOS 7...wow is all I have to say! I can't wait to get the new update ?? Bravo @Apple"
# IMPORTANT NOTE: If you are using the latest version of the tm package, you will need to run the following line before continuing (it converts corpus to a Plain Text Document). This is a recent change having to do with the tolower function that occurred after this video was recorded.
corpus = tm_map(corpus, PlainTextDocument)
#Converting text to lower case
#Pre-processing is easy in tm.
#Each operation, like stemming or removing stop words, can be done with one line in R, where we use the tm_map() function which takes as its first argument the name of a corpus and as second argument a function performing the transformation that we want to apply to the text.
#To transform all text to lower case:
corpus = tm_map(corpus, content_transformer(tolower))
#Checking the same two "documents" as before:
corpus[[1]]$content
## [1] "i have to say, apple has by far the best customer care service i have ever received! @apple @appstore"
corpus[[10]]$content
## [1] "just checked out the specs on the new ios 7...wow is all i have to say! i can't wait to get the new update ?? bravo @apple"
# Removing punctuation
corpus = tm_map(corpus, removePunctuation)
corpus[[1]]$content
## [1] "i have to say apple has by far the best customer care service i have ever received apple appstore"
corpus[[10]]$content
## [1] "just checked out the specs on the new ios 7wow is all i have to say i cant wait to get the new update bravo apple"
# Look at the stop words provided by the tm package. It is necessary to define a list of words that we regard as stop words, and for this the tm package provides a default list for the English language. We can check it out with:
stopwords("english")[1:10]
## [1] "i" "me" "my" "myself" "we"
## [6] "our" "ours" "ourselves" "you" "your"
length(stopwords("english"))
## [1] 174
#Next we want to remove the stop words in our tweets.
#Removing words can be done with the removeWords argument to the tm_map() function, with an extra argument, i.e. what the stop words are that we want to remove.
#We will remove all of these English stop words, but we will also remove the word "apple" since all of these tweets have the word "apple" and it probably won't be very useful in our prediction problem.
# Removing stopwords and apple
corpus = tm_map(corpus, removeWords, c("apple", stopwords("english")))
corpus[[1]]$content
## [1] " say far best customer care service ever received appstore"
corpus[[10]]$content
## [1] "just checked specs new ios 7wow say cant wait get new update bravo "
#Stemming
#Lastly, we want to stem our document with the stemDocument argument.
# Stem document
corpus = tm_map(corpus, stemDocument)
corpus[[1]]$content
## [1] " say far best custom care servic ever receiv appstor"
corpus[[10]]$content
## [1] "just check spec new io 7wow say cant wait get new updat bravo"
#We can see that this took off the ending of "customer," "service," "received," and "appstore."
##################################
#QUICK QUESTION
#Q:Given a corpus in R, how many commands do you need to run in R to clean up the irregularities (removing capital letters and punctuation)?
#Ans:2
#Q:How many commands do you need to run to stem the document?
#Ans:1
#EXPLANATION:In R, you can clean up the irregularities with two lines (tolower is wrapped in content_transformer() for recent versions of tm, as above):
#corpus = tm_map(corpus, content_transformer(tolower))
#corpus = tm_map(corpus, removePunctuation)
#And you can stem the document with one line:
#corpus = tm_map(corpus, stemDocument)
# Video 6
#Create a Document Term Matrix
#We are now ready to extract the word frequencies to be used in our prediction problem. The tm package provides a function called DocumentTermMatrix() that generates a matrix where:
#the rows correspond to documents, in our case tweets, and
#the columns correspond to words in those tweets.
#The values in the matrix are the number of times that word appears in each document.
corpus = tm_map(corpus, PlainTextDocument)
# Create matrix
frequencies=DocumentTermMatrix(corpus)
frequencies
## <<DocumentTermMatrix (documents: 1181, terms: 3289)>>
## Non-/sparse entries: 8980/3875329
## Sparsity : 100%
## Maximal term length: 115
## Weighting : term frequency (tf)
#We see that in the corpus there are 3289 unique words.
#Let's see what this matrix looks like using the inspect() function, slicing a block of rows and columns from the Document Term Matrix by their indices:
# Look at matrix
inspect(frequencies[1000:1005,505:515])
## <<DocumentTermMatrix (documents: 6, terms: 11)>>
## Non-/sparse entries: 1/65
## Sparsity : 98%
## Maximal term length: 9
## Weighting : term frequency (tf)
##
## Terms
## Docs cheapen cheaper check cheep cheer cheerio cherylcol chief
## character(0) 0 0 0 0 0 0 0 0
## character(0) 0 0 0 0 0 0 0 0
## character(0) 0 0 0 0 0 0 0 0
## character(0) 0 0 0 0 0 0 0 0
## character(0) 0 0 0 0 0 0 0 0
## character(0) 0 0 0 0 1 0 0 0
## Terms
## Docs chiiiiqu child children
## character(0) 0 0 0
## character(0) 0 0 0
## character(0) 0 0 0
## character(0) 0 0 0
## character(0) 0 0 0
## character(0) 0 0 0
#In this range we see that the word "cheer" appears in tweet 1005, but "cheaper" does not appear in any of these tweets. This data is what we call sparse, which means that there are many zeros in our matrix.
#We can look at what the most popular terms are, or words, with the function findFreqTerms(), selecting a minimum number of 20 occurrences over the whole corpus:
# Check for sparsity
findFreqTerms(frequencies, lowfreq=20)
## [1] "android" "anyon" "app"
## [4] "appl" "back" "batteri"
## [7] "better" "buy" "can"
## [10] "cant" "come" "dont"
## [13] "fingerprint" "freak" "get"
## [16] "googl" "ios7" "ipad"
## [19] "iphon" "iphone5" "iphone5c"
## [22] "ipod" "ipodplayerpromo" "itun"
## [25] "just" "like" "lol"
## [28] "look" "love" "make"
## [31] "market" "microsoft" "need"
## [34] "new" "now" "one"
## [37] "phone" "pleas" "promo"
## [40] "promoipodplayerpromo" "realli" "releas"
## [43] "samsung" "say" "store"
## [46] "thank" "think" "time"
## [49] "twitter" "updat" "use"
## [52] "via" "want" "well"
## [55] "will" "work"
#Out of the 3289 words in our matrix, only 56 words appear at least 20 times in our tweets.
#This means that we probably have a lot of terms that will be pretty useless for our prediction model. The number of terms is an issue for two main reasons:
#One is computational: more terms means more independent variables, which usually means it takes longer to build our models.
#The other is that in building models the ratio of independent variables to observations will affect how well the model will generalize.
# Remove sparse terms(removing some terms that don't appear very often.)
sparse = removeSparseTerms(frequencies, 0.995)
#This function takes a second parameter, the sparsity threshold. The sparsity threshold works as follows.
#If we say 0.98, this means to only keep terms that appear in 2% or more of the tweets.
#If we say 0.99, that means to only keep terms that appear in 1% or more of the tweets.
#If we say 0.995, that means to only keep terms that appear in 0.5% or more of the tweets, about six or more tweets.
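#As a rough check of the arithmetic behind these thresholds, the sketch below converts each threshold into a number of tweets and then applies the same idea to a tiny made-up matrix. The toy matrix, its term labels, and the strict ">" comparison are assumptions for illustration; the exact boundary behaviour of removeSparseTerms() is not shown in these notes.

```r
# Number of tweets implied by each threshold, given 1181 tweets
n_docs <- 1181
thresholds <- c(0.98, 0.99, 0.995)
min_docs <- (1 - thresholds) * n_docs   # about 23.62, 11.81 and 5.905
min_docs_whole <- ceiling(min_docs)     # whole tweets needed to exceed each value
min_docs_whole

# Toy document-term counts (hypothetical): 4 documents, 3 terms
toy <- matrix(c(1, 0, 2, 1,    # "iphon" appears in 3 of 4 documents
                0, 0, 1, 0,    # "freak" appears in 1 of 4 documents
                1, 1, 1, 1),   # "new"   appears in 4 of 4 documents
              nrow = 4, dimnames = list(NULL, c("iphon", "freak", "new")))

# Fraction of documents in which each term appears at least once
doc_share <- colMeans(toy > 0)

# With a toy threshold of 0.5, keep terms present in more than 50% of documents
kept <- names(doc_share)[doc_share > 1 - 0.5]
kept
```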
#Let's see what the new Document Term Matrix properties look like:
sparse
## <<DocumentTermMatrix (documents: 1181, terms: 309)>>
## Non-/sparse entries: 4669/360260
## Sparsity : 99%
## Maximal term length: 20
## Weighting : term frequency (tf)
#It only contains 309 unique terms, i.e. only about 9.4% of the full set.
# Convert sparse to a data frame to use for predictive modeling
tweetsSparse = as.data.frame(as.matrix(sparse))
#Fix variables names in the data frame
#Since R struggles with variable names that start with a number, and we probably have some words here that start with a number, we should run the make.names() function to make sure all of our words are appropriate variable names. It will convert the variable names to make sure they are all appropriate names for R before we build our predictive models. You should do this each time you build a data frame using text analytics.
# Make all variable names R-friendly
colnames(tweetsSparse) = make.names(colnames(tweetsSparse))
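#To see what make.names() does, here is a small sketch on a few hypothetical column names (the vector `terms` is made up, although strings like "7wow" do occur in the cleaned tweets above).

```r
# Hypothetical column names like those a Document Term Matrix can produce
terms <- c("7wow", "ios7", "next", "cant")

# make.names() prefixes names that start with a digit with "X" and
# appends "." to reserved words such as "next"
fixed <- make.names(terms)
fixed
```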
# Add dependent variable
#We should add our dependent variable back to this data frame. We'll call it tweetsSparse$Negative and set it equal to the original Negative variable from the tweets data frame.
tweetsSparse$Negative = tweets$Negative
# Split the data in training/testing sets
library(caTools)
set.seed(123)
split = sample.split(tweetsSparse$Negative, SplitRatio = 0.7)
trainSparse = subset(tweetsSparse, split==TRUE)
testSparse = subset(tweetsSparse, split==FALSE)
#QUICK QUESTION
#In the previous video, we showed a list of all words that appear at least 20 times in our tweets. Which of the following words appear at least 100 times? Select all that apply. (HINT: use the findFreqTerms function)
findFreqTerms(frequencies, lowfreq=100)
## [1] "iphon" "itun" "new"
#Ans:"iphon", "itun", and "new"
# Video 7
# Build a CART model
library(rpart)
library(rpart.plot)
#Let's first use CART to build a predictive model, using the rpart() function to predict Negative using all of the other variables as our independent variables and the data set trainSparse.
#We'll add one more argument here, which is method = "class" so that the rpart() function knows to build a classification model. We keep default settings for all other parameters, in particular we are not adding anything for minbucket or cp.
#Building the classification model with all the IVs
tweetCART = rpart(Negative ~ ., data=trainSparse, method="class")
#plotting the tree
prp(tweetCART)
#The tree says that
#if the word "freak" is in the tweet, then predict TRUE, or negative sentiment.
#If the word "freak" is not in the tweet, but the word "hate" is, again predict TRUE.
#If neither of these two words are in the tweet, but the word "wtf" is, also predict TRUE, or negative sentiment.
#If none of these three words are in the tweet, then predict FALSE, or non-negative sentiment.
#This tree makes sense intuitively since these three words are generally seen as negative words.
# Evaluate the Out-of-Sample numerical performance of the model to get class predictions
#Using the predict() function, we compute the predictions of our model tweetCART on the new data set testSparse. Be careful to add the argument type = "class" to make sure we get class predictions.
predictCART = predict(tweetCART, newdata=testSparse, type="class")
#computing the confusion matrix from the predictions
cmat_CART<-table(testSparse$Negative, predictCART)
cmat_CART
## predictCART
## FALSE TRUE
## FALSE 294 6
## TRUE 37 18
# Compute accuracy
accu_CART <- (cmat_CART[1,1] + cmat_CART[2,2])/sum(cmat_CART) #(294+18)/(294+6+37+18)=0.8788732
#Ans:Overall accuracy=0.8788732
#Sensitivity = 18 / 55 = 0.3273 ( = TP rate)
#Specificity = 294 / 300 = 0.98
#FP rate = 6 / 300 = 0.02
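#The four figures above can be reproduced from the confusion matrix counts alone. The sketch below rebuilds the matrix by hand from the printed CART output, so it runs without the data. The object name `cm` is an assumption for illustration.

```r
# Rebuild the CART confusion matrix from the counts printed above
cm <- matrix(c(294, 37, 6, 18), nrow = 2,
             dimnames = list(actual = c("FALSE", "TRUE"),
                             predicted = c("FALSE", "TRUE")))

accuracy    <- sum(diag(cm)) / sum(cm)                    # (TN + TP) / all
sensitivity <- cm["TRUE", "TRUE"] / sum(cm["TRUE", ])     # TP / actual positives
specificity <- cm["FALSE", "FALSE"] / sum(cm["FALSE", ])  # TN / actual negatives
fp_rate     <- 1 - specificity                            # FP / actual negatives

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, fp_rate = fp_rate), 4)
```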
#Comparison with the baseline accuracy
#Let's compare this to a simple baseline model that always predicts non-negative (i.e. the most common value of the dependent variable).
#To compute the accuracy of the baseline model, let's make a table of just the outcome variable Negative.
cmat_baseline<-table(testSparse$Negative)
cmat_baseline
##
## FALSE TRUE
## 300 55
accu_baseline <- max(cmat_baseline)/sum(cmat_baseline) #300/(300+55)=0.8450704
#Ans:Baseline model accuracy=0.8450704
#So our CART model does better than the baseline model. Let's see how Random Forest does.
#Random forest model
library(randomForest)
set.seed(123)
#Building the Random Forest model with all the IVs (this takes considerably longer since we have a large number of IVs)
#We use the randomForest() function to predict Negative again using all of our other variables as independent variables and the data set trainSparse. Again we use the default parameter settings:
tweetRF = randomForest(Negative ~ ., data=trainSparse)
# Make Out-of-Sample predictions:
predictRF = predict(tweetRF, newdata=testSparse)
#computing the confusion matrix
cmat_RF<-table(testSparse$Negative, predictRF)
cmat_RF
## predictRF
## FALSE TRUE
## FALSE 293 7
## TRUE 34 21
#Overall model Accuracy:
accu_RF <- (cmat_RF[1,1] + cmat_RF[2,2])/sum(cmat_RF)
accu_RF #(293+21)/(293+7+34+21)=0.884507
## [1] 0.884507
#The overall accuracy of this Random Forest model is 0.884507
#The accuracy is slightly better than that of the CART model, but the CART model is more interpretable than the Random Forest, so I would probably use the CART model.
#If you were to use cross-validation to pick the cp parameter for the CART model, the accuracy would increase to about the same as the random forest model. So by using a bag-of-words approach and these models, we can reasonably predict sentiment even with a relatively small data set of tweets.
##################################
#QUICK QUESTION
#Comparison with logistic regression model
#In the previous video, we used CART and Random Forest to predict sentiment. Let's see how well logistic regression does. Build a logistic regression model (using the training set) to predict "Negative" using all of the independent variables. You may get a warning message after building your model - don't worry (we explain what it means in the explanation).
#Build the model, using all independent variables as predictors:
tweetLog<- glm(Negative ~ . , data =trainSparse, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#summary(tweetLog)
#Now, make predictions on the testing set using the logistic regression model:
predictLog= predict(tweetLog, newdata=testSparse, type="response")
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
#where "tweetLog" should be the name of your logistic regression model. You might also get a warning message after this command, but don't worry - it is due to the same problem as the previous warning message.
#Build a confusion matrix (with a threshold of 0.5) and compute the accuracy of the model.What is the accuracy?
# Confusion matrix with threshold of 0.5
cmat_log<-table(testSparse$Negative, predictLog> 0.5)
cmat_log
##
## FALSE TRUE
## FALSE 253 47
## TRUE 22 33
#Let's now compute the overall accuracy
accu_log <- (cmat_log[1,1] + cmat_log[2,2])/sum(cmat_log)
accu_log #(253+33)/(253+47+22+33) = 0.8056338
## [1] 0.8056338
#Ans:0.8056338
#EXPLANATION:The accuracy is (253+33)/(253+47+22+33) = 0.8056338, which is worse than the baseline.
#The Perils of Over-fitting:
#If you were to compute the accuracy on the training set instead, you would see that the model does really well on the training set - this is an example of over-fitting. The model fits the training set really well, but does not perform well on the test set. A logistic regression model with a large number of variables is particularly at risk for overfitting.
#Note that you might have gotten a different answer than us, because the glm function struggles with this many variables. The warning messages that you might have seen in this problem have to do with the number of variables, and the fact that the model is overfitting to the training set. We'll discuss this in more detail in the Homework Assignment.
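#To put the three models side by side, the sketch below recomputes each test-set accuracy from the confusion counts reported in these notes and compares it with the baseline. The list `counts` is typed in by hand from the printed tables, so it is only as reliable as those printouts.

```r
# Test-set confusion counts reported in these notes
counts <- list(
  CART         = c(TN = 294, FP = 6,  FN = 37, TP = 18),
  RandomForest = c(TN = 293, FP = 7,  FN = 34, TP = 21),
  Logistic     = c(TN = 253, FP = 47, FN = 22, TP = 33)
)

# Accuracy = correct predictions / all predictions
accuracy <- sapply(counts, function(x) unname((x["TN"] + x["TP"]) / sum(x)))

# Baseline: always predict non-negative (300 of the 355 test tweets)
baseline <- 300 / 355

round(accuracy, 4)
beats_baseline <- accuracy > baseline
beats_baseline
```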
THE ANALYTICS EDGE
How IBM Built a Jeopardy! Champion
A Grand Challenge
Why was everyone so interested?
A Tradition of Challenges
The Challenge Begins
The Contestants
Ken Jennings
Brad Rutter
Watson
The Match Begins
match begins!
QUICK QUESTION
What were the goals of IBM when they set out to build Watson? Select all that apply.
Ans:To build a computer that could compete with the best human players at Jeopardy!, and to build a computer that could answer questions that are commonly believed to require human intelligence.
EXPLANATION:The main goals of IBM were to build a computer that could answer questions that are commonly believed to require human intelligence, and to therefore compete with the best human players at Jeopardy!.
Overview of the Jeopardy! game
jeopardy
Example Round
Jeopardy! Questions
QUICK QUESTION
For which of the following reasons is Jeopardy! challenging? Select all that apply.
Ans:A wide variety of categories; speed is required (you have to buzz in faster than your competitors); the categories and clues are often cryptic.
EXPLANATION:Jeopardy! is challenging because there are a wide variety of categories, speed is required, and the categories and clues are cryptic. Expert knowledge is not generally required.
Why is Jeopardy Hard?
A Straightforward Approach
Using Analytics
Watson’s Database and Tools
How Watson Works
QUICK QUESTION
Which of the following two questions do you think would be EASIEST for a computer to answer?
Ans:What year was Abraham Lincoln born?
EXPLANATION:The second question would be the easiest, because the answer is a fact. The first question is much more subjective.
Step 1: Question Analysis
Step 2: Hypothesis Generation
QUICK QUESTION
Select the LAT of the following Jeopardy question: NICHOLAS II WAS THE LAST RULING CZAR OF THIS ROYAL FAMILY (Hint: The answer is “The Romanovs”)
Ans:THIS ROYAL FAMILY
Select the LAT of the following Jeopardy question: REGARDING THIS DEVICE, ARCHIMEDES SAID, “GIVE ME A PLACE TO STAND ON, AND I WILL MOVE THE EARTH” (Hint: The answer is “A lever”)
Ans: THIS DEVICE
EXPLANATION:The LAT in the first question is “THIS ROYAL FAMILY” and the LAT in the second question is “THIS DEVICE”. Remember that if you replace the LAT with the correct answer, the sentence should make sense.
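The substitution check described in the explanation can be tried out with a plain string replacement. This is only a sketch of the sanity check, not of how Watson identifies the LAT. The variable names are made up for illustration.

```r
# Clues from the question above
clue1 <- "NICHOLAS II WAS THE LAST RULING CZAR OF THIS ROYAL FAMILY"
clue2 <- "REGARDING THIS DEVICE, ARCHIMEDES SAID, GIVE ME A PLACE TO STAND ON, AND I WILL MOVE THE EARTH"

# Replace the LAT with the answer; fixed = TRUE treats the LAT as literal text
filled1 <- sub("THIS ROYAL FAMILY", "THE ROMANOVS", clue1, fixed = TRUE)
filled2 <- sub("THIS DEVICE", "A LEVER", clue2, fixed = TRUE)

filled1
filled2
```

If the resulting sentences read naturally, the chosen phrase is a plausible LAT.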
Step 3: Scoring Hypotheses
Lightweight Scoring Algorithms
Scoring Analytics
Passage Search
Passage_Search_diff
Scoring Analytics
Step 4: Final Merging and Ranking
Ranking and Confidence Estimation
The Watson System